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Abstract. The slow processes of metastable stochastic dynamical systems are difficult to access by 
direct numerical simulation due the sampling problem. Here, we suggest an approach for modeling 
the slow parts of Markov processes by approximating the dominant eigenfunctions and eigenvalues 
of the propagator. To this end, a variational principle is derived that is based on the maximization 
of a Rayleigh coefficient. It is shown that this Rayleigh coefficient can be estimated from statistical 
observables that can be obtained from short distributed simulations starting from different parts of 
state space. The approach forms a basis for the development of adaptive and efficient computational 
algorithms for simulating and analyzing metastable Markov processes while avoiding the sampling 
^ 1 problem. Since any stochastic process with finite memory can be transformed into a Markov process, 

Q the approach is applicable to a wide range of processes relevant for modeling complex real-world 

, phenomena. 
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^ ! 1. Introduction 

In this article, we consider continuous-time Markov processes z t € living in a usually large state 
space Q that is either continuous or discrete. The process z t is considered to be sufficiently ergodic 
such that a unique stationary density (invariant measure) exists. Independent of the details of 
the dynamics (such as system size, potential, stochastic coupling, etc), there exists a family of linear 
propagators V{t) which evolve the probability density of states p T as: 
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(1.1) Pt =V(t) P o 



Continuous-time Markov processes are useful models of real- world processes in a variety of areas |44| . 
Examples include macroscopic phenomena including the evolution of financial and climate systems 
j^. ■ [24l [30] , as well as microscopic dynamics such as the diffusion of cells in liquids Q] , the diffusion of 

biomolecules within cells [41] , the stochastic reaction dynamics of chemicals at surfaces [31] , and the 
stochastic dynamics governing the structural dynamics of molecules [2j. Often, these dynamics are 
\ metastable, i.e. they consist of slow processes between sets of state space that have long lifetimes. In 

macromolecules, such slowly-exchanging sets are called conformations, hence the union of the slowest 
dynamical processes are there termed conformation dynamics [351 137] , 

In practice, the slow dynamical processes are the ones which pose the greatest difficulties to direct 
numerical simulation as they require the longest simulation times. An extreme example is the atomistic 
simulation of solvated biomolecules which would require the propagation of a system with 10 4 — 10 6 
particles for 10 7 — 10 10 time-steps, a task that is intractable or hardly tractable even with special- 
purpose supercomputers [33] ■ However, the slowest processes are also the ones that are the most 
interesting in many systems. They often correspond to rare events that change the global structure 
and/or the functional behavior of the system. For example, in macromolecular systems, the slowest 
events often correspond to functional conformational changes such as folding, binding or catalysis 
[551 ESI E HI] • Therefore, a method is sought that models the slow dynamical processes of continuous- 
time Markov processes accurately and ideally in a way that supports the efficient simulation of these 
processes without the need to solve the full direct numerical simulation problem. 

Especially in statistical physics, many theories and methods have been proposed to model slow dy- 
namical processes. Examples are rate theories that describe the passage rate of a process across a 
surface that separates metastable states [TSl HO] , pathway-based theories and methods that describe 
the transition dynamics of a system from a subset A to a subset B of state space [141 [SJ [33] and 
network-based approaches that attempt to coarse-grain the high-dimensional dynamics to a network 
of discrete jump events between sub-states or landmarks |47l 148] , These approaches usually assume a 
separation of timescales between the slow and the fast processes that results from vanishing smallness 
parameters (e.g. noise intensity, temperature). In these cases, the mathematical analysis can be based 
on large deviation estimates and variational principles |16[ 114]. 



A VARIATIONAL APPROACH TO MODELING SLOW PROCESSES IN STOCHASTIC DYNAMICAL SYSTEMS 2 



In this article, we consider a more general approach to describing slow dynamical processes. When the 
operator V is compact and self-adjoint, eq. can be decomposed into the propagator's spectral 

components 



where [i, I2, 13, ... are the propagator's eigenfunctions and Aj(r) = cxp(— k^t) (sorted in non-ascending 
order) are the propagator's real- valued eigenvalues that decay exponentially in time with rates m. 
a i(po) are factors depending on the initial density p$. We here consider the situation that the 
eigenspaces of slow and fast processes are orthogonal. For example, in the important case that the 
system studied is a microscopic physical system in thermal equilibrium, such that the process z* is 
reversible with respect to the invariant density fi, the eigenspaces of slow and fast processes are or- 
thogonal for any choice of m. In such a case, m is a model parameter that can be chosen to distinguish 
the m slow processes of interest from the fast processes T^fast ("7") that are not of interest. For initial 
densities p € span{/i, Z 2 , Z m }, the term for fast processes in Eq. (|1.2[) vanishes and the propagation 
can be performed exactly using only the dominant eigenfunctions. Note that Eq. Q1.2p represents 
the formulation of the dynamical process problem as a multi-scale problem in time: /i represents the 
timescale 00, t-z — k^ 1 = — r/ln A2 is the slowest dynamical process, etc. This formulation thus allows 
essential characteristics of the dynamical process to be described by treating its scales separately. The 
relation between dominant eigenvalues, exit times and rates, and metastable sets has been studied by 
asymptotic expansions in certain smallness parameters as well as by functional analytic means without 
any relation to smallness parameters |23 [ \12 \ |46" 1 [4| 15] . In particular, pHHH] established fundamental 
relations between the eigenvalues and -functions of V and metastable sets. 

The task is now to approximate the propagator's dominant eigenfunctions /i, I2, Z3, ... and eigenvalues A^. 
In other disciplines, variational principles have been worked out in order to approximate eigenfunctions 
and eigenvalues of known operators such as a quantum- mechanical Hamiltonian |43) . In contrast, for 
many complex dynamical systems, V(t) is either not known explicitly, or not available in a form that 
can be transformed into Eq. (|1.2|) . Instead, V(r) is given implicitly through stochastic realizations of 
the process z t . Therefore, a variational principle is sought that allows eigenfunctions /1, Z2, Z3, ■•• and 
eigenvalues Ai to be approximated through statistical observables of z t . Such a variational principle 
will be formulated in the present paper. 

This problem has extensively been studied for specific functional forms of the eigenfunctions Zj. When 
li are approximated via characteristic functions on a given set decomposition of state space, i.e. Zj/i -1 € 
span{ls 1 , ls„} with Sj C f2, the problem of finding the best approximation to eigenfunctions and 
eigenvalues is solved by a Markov model or Markov state model (MSM) (35J gj QH EH 123 H2 ■ In 
another version of MSMs, Zj are approximated by committor functions between a few pre-defined 
"cores" that form a non-complete subset of state space (8j [13j [36] . Basis functions that form a partition 
of unity are used in [49] . Markov state models have been recently used a lot to model molecular 
dynamics processes, especially in conjunction with large amounts of distributedly simulated trajectories 
[421 |4"UI [Ml (HI (2H|- Applications include conformational rearrangements and folding of peptides, 
proteins and RNA [51 [2j|l [H [2H1 S3 HZ| ■ In this application area, MSMs have had significant impact 
because they can be estimated from relatively short simulation trajectories and yet allow the system 
behavior to be predicted at long timescales. 

Despite this success, significant challenges remain. For example, in most current applications, the 
discretization of state space is done heuristically via a Voronoi partition of state space obtained from 
clustering available data points. The ability to construct a state space discretization adaptively would 
tremendously aid the construction of MSMs that are precise while avoiding the use of too many states. 
Such an adaptive discretization must be guided by an objective function that somehow measures 
the error made by the model. A bound to the MSM discretization error has been derived in [34 . 
However, this error is not suitable to design a constructive discretization approach as its evaluation 
requires knowledge of the exact eigenfunctions. The variational principle derived in this paper only 
uses statistical observables and is henceforth the basis for such a constructive approach to adaptive 
discretization. Furthermore, it is shown that existing Markov modeling approaches can be understood 
as a constrained optimal solution of the variational principle via a Ritz or Roothan-Hall method 
with different choices of basis sets. Based upon this formulation, further development of basis sets 
appropriate for describing complex molecular conformation dynamics can be made. 
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The article is organized as follows: In Sec. [5] a variational principle is formulated where the dominant 
eigenfunctions of stochastic dynamical systems are approximated by maximizing a Rayleigh coefficient, 
which - in the limit of the exact solution - is identical to the true eigenvalues. This Rayleigh coefficient 
is linked to the computation of correlation functions that can be evaluated without explicit knowledge 
of the propagator V . Sec. [3] makes considerations which types of functions may be practically useful to 
construct an approximation to Eq. (|I.2p in complex systems. Sec. 2] shows numerical experiments on 
a diffusion process in a one-dimensional double- well potential. Sec. concludes this study and makes 
suggestions for the next steps 



2. VARIATIONAL PRINCIPLE FOR CONFORMATION DYNAMICS 



2.1. Basics. Let D, be a state space, and let us use x, y to denote points in this state space. We 
consider a Markov process zj on fi which is stationary and ergodic with respect to its unique stationary 
(invariant) distribution /x(x) = p(z t = x) Vt. The dynamics of the process z t are characterized by the 
transition density 

(2.1) p(x,y; r) = p{z t+T = y | z t = x), 

which we assume to be independent of the time t. The correlation density, i.e., the probability density 
of finding the process at points x and y at a time spacing of r, is then defined by 

(2.2) C(x,y; r) = |«(x)p(x,y; r) = p(z t+T = y, z t = x). 

We further assume z f to be reversible with respect to its stationary distribution, i.e.: 

(2.3) /x(x)p(x,y; r) = /i(y)p(y,x; r) 

(2.4) C(x,y;r) = C(y,x; r). 

Reversibility is not strictly necessary but tremendously simplifies the forthcoming expressions and 
their interpretation |34] , In physical simulations, reversibility is the consequence of the simulation 
system being in thermal equilibrium with its environment, i.e. the dynamics in the system is purely a 
consequence of thermal fluctuations and there are no external driving forces. 

If, at time t = 0, the process is distributed according to a probability distribution po, the corresponding 
distribution at time r is given by: 

(2.5) p T (y) = / ebcp (x)p(x,y; r) =: V(r)p . 

Jq 

The time evolution of probability densities can be seen as the action of a linear operator V(t), called 
the propagator of the process. This is a well-defined operator on the Hilbert space L 2 _ 1 (ft) of functions 
which are square- integrable with respect to the weight function fx . The scalar-product on this space 
is given by 

(2.6) (u\v) != / dxu(x)w(x)^ 1 (x). 

If we assume the transition density to be a smooth and bounded function of x and y, the propagator 
can be shown to be bounded, with operator norm less or equal to one. Since the stationarity of p, 
implies V{t)\x = p., we even have HT^t)) = 1. Reversibility allows us to show that the propagator 
is self-adjoint and compact. Furthermore, using the definition of the transition density, we can show 
that V(t) satisfies a Chapman- Kolmogorov equation: For times tx,t 2 > 0, we have 

(2.7) V(t 1 +t 2 )=V(ti)V(t 2 ). 

2.2. Spectral decomposition. It follows from the above arguments that V(r) possesses a sequence 
of real eigenvalues Aj(r), with |Aj(r)| < 1 and |Aj(r)| — > 0. Each of these eigenvalues corresponds to an 
eigenfunction 6 L 2 The functions U form an orthonormal basis of the Hilbert space L 2 _ 1 (H,). 
Clearly, Ai(r) = 1 is an eigenvalue with eigenfunction ?j = /i. In many applications, we can assume 
that Ai(t) is non-degenerate and —1 is not an eigenvalue. Additionally, there usually is a number m 
of positive eigenvalues 

(2.8) l = A 1 (T)>A 2 (r)>...>A m (r), 

which are separated from the remaining spectrum. Because of the Chapman-Kolmogorov equation, 
each eigenvalue Ai(r) decays exponentially in time, i.e. we have 

(2.9) Ai(r) =exp(- Kl r) 
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for some rate Ki > 0. Clearly, k\ = 0, Ki,...,K m are close to zero, and all remaining rates are 
significantly larger than zero. If we now expand a function u € L in terms of the functions 
i.e. 



(2.10) u = J2(u\h)^-ih 



we can decompose the action of the operator V(t) into its action on each of the basis functions: 

■p(t)u = YZi(u\k)^v{T)k 

(2.11) = YZ l K{T)(u\k)^ 1 k 

= E^i ex P(- K i T ) ( u I h)^-i h- 

For lag times r ^ - 1 , all except the first m terms in the above sum have become very small [34], 
and to a good approximation we have 

m 

(2.12) V{t)u « exp(-^r) (u \ k)^ k. 

i=i 

Knowledge of the dominant eigenfunctions and eigenvalues is therefore most helpful to the understand- 
ing of the process. 



Remark 1. Instead of the propagator V(t), one can also consider the transfer operator T(t), defined 
3< 



for functions u £ L 2 A£i) by: 



(2.13) T{t)u(y) = -L- [ djgj(x,y;r)A*(x)u(x). 
Using the unitary multiplication operator A4 : L^Cl) M> L^_ 1 (f2), defined by 

(2.14) .Mit(x) = /x(x)u(x), 
we have 

(2.15) P(t) =XT(t)M- 1 , 

and consequently, the transfer operator inherits all of the above properties from V(t). In particular, 
there is a sequence of eigenfunctions 

(2.16) r i =^ 1 l i 

of 7"(t), corresponding to the same eigenvalues Ai(r), which arc normalized w.r.t. to the scalar-product 
weighted with \x instead of [iT 1 . Especially, we have T\ = fj, fx—l . The two operators can be treated 
as equivalent, and all of the above could have been formulated in terms of T(t) as well. 

2.3. Rayleigh variational principle. In nontrivial dynamical systems neither the correlation densi- 
ties p(x, y; t) and C(x, y; r) nor the eigenvalues Aj(r) and eigenfunctions are analytically available. 
This section provides a variational principle based on which these quantities can be estimated from 
simulation data generated by the dynamical process z t . For this, the formalism introduced above 
is used to formulate the Rayleigh variational principle used in quantum mechanics |43) for Markov 
processes. 

Let / be a real-valued function of state, / = /(x) : il — >• M. Its autocorrelation with respect to the 
stochastic process z t is given by: 

(2.17) acf(/;r) = E[/(z ) /(z T )] = [ [ dx dy /(x) C(x, y; r) /(y) = (V(r)fif | pf)^ . 



x Jy 



In the Dirac notation often used in physical literature, integrals such as the one above may be abbre- 
viated by E[/(x ) /(x T )] - (fxf \ V(t) 
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Theorem 2. The autocorrelation function of a weighted eigenfunction = n 1 is its eigenvalue 
A fc (r): 

(2.18) acf(r fe ; r) = E [r fe (z ) r k {z T )} = A fe (r). 
Proof. Using (|2.17p with / = /i -1 ^-, it directly follows that: 

acf(r fc ;r) = (V(r)l k \ h) „-i 

(2.19) = A fe (r) (l k | h)^ 

□ 

Theorem 3. Let I2 be an approximate model for the second eigenfunction, which is normalized and 
orthogonal to the true first eigenfunction: 

(2.20) = 

(2.21) (h,h),>-i = !• 
Then we find for f 2 — fi I2' 

(2.22) acf(f 2 ; r) = E [f 2 (z ) f 2 (z r )] < A 2 (r). 



Proof. The proof is an application of the Rayleigh variational method to the operator V{t). If i 2 is 
written in terms of the basis of eigenfunctions l\\ 

00 

(2.23) l2=J2 aill > 

i=2 

where a± = because of the orthogonality condition, we find: 



acf(r 2 ;r) = (v(t% I h) _ 

(2.24) = J2Z^ a ^ a J X d T ) (h Ih)^ 

E 2 = 2 a ?^(r) 
< A 2 (r)E,= 2 a? 

A 2 (r). 

The pre-last estimate is due to the ordering of the eigenvalues, and the last equality results from the 
normalization condition l2.21l and Parseval's identity. □ 

Corollary 4. Similarly, let l k be an approximate model for the fc'th eigenfunction, with the normal- 
ization and orthogonality constraints: 

(2.25) {i k ,h}^ - 0, Vi<k 

(lk,h)p,-i = 1, 

then 

(2.26) acf(f fc ; r) = E [f fe (z ) r fe (z T )] < A fc (r). 
The proof is analogous to Theorem [3] 

Remark 5. Improved estimates than those of [5] to S] have been obtained by [22J for characteristic 
functions. With a re-definition of the terminology, they can be directly transferred to the case of 
mutually orthonormal basis functions. It would be interesting to study the applicability of these 
results to more general cases. However, obtaining such estimates is not the focus of the present paper. 

Remark 6. The variational principle given by Theorems © to (j?)) is fulfilled for l k with k > 2 only if 
the k — 1 dominant eigenfunctions are already known. 

In particular, the first eigenfunction, i.e. the stationary density must be known. In practice, these 
eigenfunctions are approximated via solving a variational principle. Nonetheless, some basic statements 
can be made even if no eigenfunction is known exactly. For example, it is trivial that when the 
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estimated stationary density fi is used in Theorem [5J then the estimate of the first eigenvalue is still 
always correctly 1: 

(2.27) acft/r 1 /!; r) = acf(l; r) = 1 
and from theorems [5] and [3] it follows that any function pt 

(2.28) acf(f fe ; r) < 1 

hence the eigenvalue 1 is simple and dominant also when estimating eigenvalues from data. 

Remark 7. An important insight at this point is that a variational principle of conformation dynamics 
can be formulated in terms of correlation functions. In contrast to quantum mechanics or other 
fields where the variational principle has been successfully employed, no closed-form expression of 
the operator V{t) is needed. The ability to express the variational principle in terms of correlation 
functions with respect to V(t) means that the eigenvalues to be maximized can be directly estimated 
from simulation data. If statistically sufficient realizations of z t are available, then the autocorrelation 
function can be estimated via: 

(2.29) acf(f fc ; r) = E(r fe (z )f fc (z T )) w — f fe (z )r fc (z r ), 

where N is the number of simulated time windows of length t. We will try to use this in the application 
of the method. 

2.4. Ritz method. The Ritz method is a systematic approach to find the best possible approxima- 
tion to the m first eigenfunctions of an operator simultaneously in terms of a linear combination of 
orthonormal functions [32J . Here the Ritz method is simply restated in terms of the present notation. 
Let Xi '■ ^ — > K, i £ {1, m} be a set of m orthonormal basis functions: 

(2.30) {Xi,Xo)^=kj, 
and let x denote the vector of these functions: 

(2.31) x(x) = [xi(x),..., Xm (x)] T . 
We seek a coefficient matrix B 6 R roxm , 

(2.32) B= [b!,...,b m ] 

with the column vectors = [bn, bi m ] T that approximate the eigenfunctions of the propagator as: 

m 

(2.33) h(x) = bf X (x)=X>iX*(*), 

3=1 

with respect to the constraint that the functions are also normalized. It turns out that the solution 
B to the eigenvalue equation 

(2.34) HB = BA, 
with individual eigenvalue/eigenvector pairs 

(2.35) Wo i =b i X i , 
and the density matrix H = [foy] defined by: 

(2.36) h %0 = f /'dxdy / !- 1 (x)x,(x)C(x,y; r) ^ 1 (y)x J (y) 

(2.37) = El/i-^M^XjML 



yields the desired result. More precisely, the eigenvector bi corresponding to the greatest eigenvalue Ai 
from 12.351 contains the coefficients of the linear combination which maximizes the Rayleigh coefficient 
among the functions Xii an d this maximum is given by Ai. Consequently, Ai should be as close as 
possible to Ai = 1, and the function generated from bi should model the stationary density l\. But 
furthermore, the remaining eigenvalues and eigenvectors generated from (|2.34j) can be used as estimates 
of the other eigenvalues A2 , . . . , A m : 

Corollary 8. The second estimated eigenvalue A2 from i2.35\) satisfies A2 < A2. 
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Proof. First of all, note that (I2 \ li ) =0 by the orthogonality of the eigenvectors of the matrix 
H. For the same reason, we find that: 



m 

= bub2jhij 



A 2 ^ 6it&2i 



i=l 



(2.38) = 0. 



Now, let Z = xl\ +1/I2 be a linear combination of the first two model eigenfunctions which is normalized 
such that 



1 = { l\V 

(2.39) = x 2 +y 2 . 

Using (I2.38P and (|2.39p . computing the Rayleigh coefficient of I results in: 

(V{r)l I 1)^ = x 2 (V(r)h I h)^ + y 2 (p(r% \ h) ^ 

= x 2 X 1 +y 2 X 2 

(2.40) = Ai- y 2 (Ai-A 2 ). 

which is bounded from below by A2. Clearly, there is a normalized linear combination I of l\ and I2 
which is orthogonal to l\. By (I2.40p and the variational principle, we conclude that: 



A 2 < (7>(t)Z ] 



< A 2 . 

□ 

Remark 9. Due to the equality between Eq. (|2.36[) and (|2.37[) the elements of the H matrix can be 
estimated as correlation functions of a simulation of the process z t , as mentioned above, provided that 
a sufficient approximation of /i is at hand. 

2.5. Roothaan-Hall method. The Roothaan-Hall method is a generalization of the Ritz method 
used for solving the linear parameter optimization problem for the case when the basis set is not 
orthogonal [H1QI1. Let the matrix S e W nxm with elements 

(2.41) St^ixuXi)^ 

be the matrix of overlap integrals with the normalization conditions Su = 1. Note that S has full rank 
if and only if all Xi are pairwise linearly independent. The optimal solutions b; in the sense of Eqs 
(|2.32j) - ()2.33j) are found by the eigenvectors of the generalized eigenvalue problem: 

(2.42) HB = SBA 
with the individual eigenvalue/eigenvector pairs: 

(2.43) Hbi = SbiA,. 



Remark 10. The Ritz and Roothaan-Hall methods are useful for eigenfunction models that are ex- 
pressed in terms linear combinations of basis functions. Non-linear parameter models can also be 
handled with nonlinear optimization methods. In such nonlinear cases it needs to be tested whether 
there is a unique optimum or not. 
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2.6. Markov state model. As an example, let {S\, S n } be pairwise disjoint sets partitioning ft 
and let 7Tj = J s dx/i(x) be the stationary probability of set Sj C £1. Consider the piecewise constant 
functions 

(2.44) Xl = -Ll Si 

where ls < is the characteristic function that is 1 for x e Si and elsewhere. Since Si n Sj = for all 
i 7^ j these functions form a basis set with {XiiXj)ti = Therefore, we can directly use them as a 
model for the transfer operator eigenfunctions r^. Evaluation of the corresponding H matrix yields: 



(2.45) hij = — = I I dxrfyl 5s C(x,y;r)l Si 



1 



^/nWj j x' ./ x 

( ' i i 



/ dxdy C(x,y; r) 
is 4 




where Cy = P(z t+T G Sj,z t G Sj) is the joint probability of observing the process in sets Sj and Sj 
with a time lag of r while = P(z t+T G Sj | z f G Sj) is the corresponding transition probability. 
Thus, computing the optimal step-function approximation to the true eigenfunctions = (i li and 
eigenvalues Aj(r) via the Ritz method is the same as computing eigenvalues and eigenvectors of the 
Markov model transition matrix T = [Tij] and scaling them appropriately. This conclusion can also 
be obtained from Ref. [Ml via a different route. 



3. Modeling 

Section [5] has provided a general variational principle for approximating the dominant eigenvalues and 
eigenfunctions of Markov processes. In order to apply this principle to complex systems, a useful level 
of modeling the eigenfunctions in terms of basis functions needs to be found, and appropriate classes 
of basis functions must be identified. This sections attempt a first approach to this problem by making 
general considerations for what modeling schemes might be appropriate. 

3.1. Half- weighted eigenfunctions. Is it beneficial to directly model the propagator eigenfunctions 
Ik, their weighted counterparts — ^ x lk or rather yet another set of functions? We would like to 
use a model that has the following properties: 

(1) As basis functions Xi it is preferred to use local functions, i.e. either functions with compact 
support, or at least with the property limixi^oo Xi( x ) ~> 0- Such locality is useful to direct the 
computation effort to specific regions of state space and may aid the adaptive refinement of the 
eigenfunction approximation by specifically adding basis functions that add local refinements. 
Since we also aim at modeling eigenfunctions as linear combinations of basis functions we 
cannot use the eigenfunctions that are not local. 

(2) We would like to be able to pre-compute as many expressions as possible analytically. When 
using appropriate basis functions Xi an d Xj it may be possible to calculate analytic solutions of 
the integrals (xi, Xj)i albeit this feature is usually destroyed when weighting with the stationary 
density as in {xiiXj) ■ Therefore we will also avoid using the eigenfunctions Ik that would 
require such a weighting. 

Consider a rewrite of Eq. (|2.17p as: 

acf(r fc ; t) = I I dx dy ^ (x) Z fe (x) ^5 (x) C*(x, y; r) pT^ (y) yT^ (y) Z fe (y) 

Jx Jy 

(3.1) = f /' C Zxdy0 fc (x)S(x,y; r) fc (y) 

JxJy 



where the "half-weighted" eigenfunctions: 

(3.2) <Mx) 



*i(x) 
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and the "half-weighted" correlation density: 
(3.3) 5(x,y;r)- 



^2( X )^2(y) 

have been defined. When now modeling the half-weighted eigenfunctions fa using some basis set, the 
following nice properties are obtained: 

(1) Local basis functions can be used. This follows from limixi^oo fa(x) = limix^oo ^(x)r.;(x) — > 


(2) The normalization condition requires a non- weighted scalar product: 

(3.4) (U,lj)p-* = (Jrj^^^Tj2 

= (fa,<t>j) = 

(3) When (Xi,Xj) is analytically computable and fa = X)i c iXi; then {fa,fa) is also analytically 
computable. 

(4) The first half-weighted eigenfunction has eigenvalue 1 and is identical to the half-weighted 
stationary density 

(3.5) fc(x) = = Kx) V 2 . 

(5) When models of /i(x) 1 / 2 and fa are available, the Rayleigh coefficient in Eq. (|3.ip can be 
estimated numerically as the autocorrelation of 

(6) When defining a propagator V* in half- weighted space via: 

(3.6) p T (y) = P(r)o P0 (x) 



My) = / dxp(x)p(x,y; T ) 

J x 



M(y) 1/2 L M x ) 1/2 My) 1/2 
My) = / dx M x )M x >y; r ) 

= P*(t)o^(x) 
then is self- adjoint and has orthogonal eigenfunctions 

(3.7) fa\ = V*{r)fa 

It follows from theorem ([2]) that the exact eigenvalues are calculated by the Rayleigh coefficients of 
the exact eigenfunctions: 

(3.8) Aj = (ji~%fa I C I n~^4>i^ = adiQrifa; T ) 

while they are approximated from below by the Rayleigh coefficients of the approximate eigenfunctions: 

(3.9) A l > A 4 = I C I M"^) = acf(^-2^; r). 

This Rayleigh-coefficient can be directly sampled: for a given trajectory z t , it can be estimated as: 

(3.10) Ai > A, = E t /t(zt)"20 i ( Zt )^( Zt+r )"2 ( ^-( Zt+r ) 

Thus, for a given trajectory z t , the optimal eigenfunctions fa can be calculated by maximizing the 
Rayleigh coefficient, using e.g. the Ritz or the Roothaan-Hall method. 

3.2. Gaussian Basis functions. In complex dynamical processes such as molecular dynamics of 
biomolecules, one has to devise basis sets that can be evaluated in high dimensions. While the present 
work provides merely a starting point for identifying appropriate basis sets that go beyond common 
choices such as the step function basis or the committor basis, we here suggest a possible choice that 
is potentially applicable to the molecular dynamics setting. Empirically, it has been found that the 
stationary densities of biomolecules in the essential subspace is often clustered [TB]. Therefore, we put 
forward the idea that /z(x) 1 / 2 and the other half- weighted eigenfunctions can be approximated by a 
Gaussian mixture. 

Let us thus assume that the state space f2 is a metric space with distance d(x, y), and let us model 
the half- weighted invariant density /i(x) 1 / 2 in terms of Gaussian basis functions: 



A VARIATIONAL APPROACH TO MODELING SLOW PROCESSES IN STOCHASTIC DYNAMICAL SYSTEMS 10 



(3.11) AM 1 / 2 = £ a. exp 

where <n £ R, y, G and er € R are amplitude, mean and shape parameters. The invariant density 
can then be analytically given: 

(3.12) A(x) = (^) 1/2 ) 2 = (Z^ x p(-%^)) 

a, cxp I -5— 1 + ^ 2a, flj cxp I — 2 J - 1 . 

i ^ ' i<j ^ ' 

Furthermore, consider that the half-weighted eigenfunctions be given in terms of the same Gaussian 
basis: 

d(x, y 4 



(3.13) </> fe (x) = X hi exp 



2a 2 



where must be appropriately chosen to guarantee orthogonality with respect to the invariant density. 
The corresponding unweighted eigenfunctions f k are 

, , , , k(x) E l ^ ex p(-%^ 

(3.14) f fe (x)- ' 



which does not have a simple form, but can be evaluated point-wise. In order to enforce the normal- 
ization ((f>k,<t>i) — Ski we consider 

d(x,yi) + d(x,yj) 



(3.15) {(f>k,(i>l) = J dxXX &fc4 ^ exp 



2a 2 



» 3 

which can be analytically evaluated when is an Euclidean space. 

Using Gaussian Ansatz functions in half-weighted space may thus have important practical benefits. 
However, other basis sets, especially sets of other Radial basis functions than Gaussians may also be 
a good choice for high-dimensional systems that deserve further investigation. 

4. Numerical examples 

4.1. Metastable potential from a Gaussian stationary density. The example is chosen such 
that it is tractable by direct grid discretization so as to be able to generate a reference solution. 
Different optimization methods for the variational problem and choices of basis sets are considered 
and illustrated. 

Let f2 = R be our state space with points First, a "Gaussian hat" function is defined via: 

( x-q) 2 s 
2s 2 



(4.1) gh(x; a, s) := exp I — 



where a e R is the mean and set the standard deviation. We define a stationary density from two 
Gaussians: 

fi{x) := -^=(gh{x,-2,l)+gh(x,2,l)) 
2v 7r 

The corresponding dimensionless generating potential is given by 

(4.2) U(x) = -ln(»(x)) 
which exerts a force on a particle at position x of: 

(4.3) /(*) = -VU(x) = ^l»(x). 

Using Smoluchowski dynamics and Euler discretization, a time-step x A y is given by 

(4.4) y = x + tJ(x) + V2tt] 
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Figure 4.1. Two-well potential with Smoluchowski dynamics, r = 0.025. (a) Double- 
Gaussian density fi(x), (b) the corresponding potential U{x), (c) the half-weighted 
density <\>\ (x) = y/ fi(x), (d) slowest- process eigenfunction <j>2 (x) from direct numerical 
solution. 



where r\ is a normally distributed random variable (white noise). The transition density can hence be 
written as: 

(4.5) p(x, y- t) = N y {x + rf(x), V27) 
and the correlation density is given by: 

(4.6) C(x,y;r) = n(x)p(x,y;T) 

Now we are concerned about estimation of eigenf unctions. The first half- weighted eigenfunction is the 
square root of the stationary density: 

(4.7) fa = y/jfx) 

such that 4>\{x) = fi(x). We here assume n{x) to be known, although in practice it must be estimated. 

4.2. Ritz method with characteristic functions (Markov state model). We aim at approxi- 
mating the eigenvalues and eigenfunctions of the true propagator via the Ritz method described in 
Sec. 12.41 Here, a basis set x — {Xi( x )> •••! Xn{x)) t consisting of N =20 characteristic functions in the 
range x £ [—6, 6] defined by: 

(4-8) Xi = ~ J=^l[-6+0.6i, -5.4+0.6i] 



where lr w is the characteristic function that is 1 on the interval [a,b] and outside, and 7r^ = 
/ n(x)dx is the stationary probability of the set Si = [—6 + 0.6i, —5.4 + 0.6i]. The corresponding 
density- matrix H = [hij] £ K. NxN defined by (Dirac notation): 

(4.9) hij =( X i\C\ Xj ) 

takes the form 



(4.10) hij = 



■Ki j s dx j s dy C(x,y; r) 



as in Sec. 

The H matrix was calculated by direct numerical integration using Mathematica and the eigenvalue 
problem was subsequently solved, yielding the optimal coefficient vectors C! and c 2 that provide the 
approximations (j>\ w <j>\, 02 ~ 4>2 and Ai ~ A2, A2 ~ A2. The results are given in Fig. 14.21 a and 
b, indicating that the eigenvalues are approximated to two significant digits while the eigenfunctions 
retain a significant discretization error that arises from the step-function basis used. 
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FIGURE 4.2. Approximation to the eigenfunction shown in Fig. 14.11 using different 
methods and basis sets. The true eigenvalues are 1.0, 0.998913. The reference solutions 
are shown in red dotted lines while the approximations are shown in black solid lines. 
a,b) MSM / Ritz method with 20 characteristic functions in the range x € [—6,6]. 
Eigenvalues 1.0, 0.980384. c,d) Ritz method with a basis set of 20 Hermite functions, 
Eigenvalues 1.0, 0.998913. e,f) Roothaan-Hall method with a basis set of 11 Gaussians 
Eigenvalues 1.0, 0.995507. 



4.3. Ritz method with a Hermite basis. In order to arrive at a smooth solution we employ the 
Ritz method with a smooth orthogonal function basis. Here, we choose the Hermite functions, defined 
by: 

(4.11) = (-iy(2H!^)- 1/2 e* 2 / 2 ^-e-* 2 . 

ax 1 

The Hermite functions are local (limix^oo Y;(x) — > 0) and are thus useful to model the behavior of the 
eigenfunctions (j)k where the stationary density is significantly larger than zero. The basis functions 
are defined to be the normalized Hermite functions with 

such that (xi,Xj) = 

Using the basis set x = [Xoi — j Xi9] T j the H matrix was calculated by direct numerical integration 
using Mathematica and the Ritz method was used to approximate eigenvalues and eigenfunctions of 
the propagator. As shown in Fig. 14.2b and d, a nearly perfect approximation of both eigenvalues and 
eigenfunctions is obtained even though the number of basis functions used is identical to the MSM 
approach of the previous section. However, the MSM approach has the advantage that it can be 
employed in high-dimensional spaces which is not the case with Hermite basis functions. 

4.4. Roothaan-Hall method with a Gaussian basis. In order to have a hope to solve high- 
dimensional problems, one must resort to simple basis functions, ideally ones with analytical properties 
that can be practically evaluated in high-dimensional spaces. Therefore, we here suggest the use of 
Gaussian basis functions as described in Sec. 13.21 In the one-dimensional case, the Gaussians used 
are: 



(4.13) ghi(x) = exp 



2ct 2 
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FIGURE 4.3. Nonlinear optimization of the propagator eigenfunctions. (a) value of A2 
depending on y 2 and with fixed s 2 — 0.8. (b) value of A2 depending on s 2 with fixed 
2/2 = 1) (c) Approximation to 4>2 with 7/2 = 1, s 2 = 0.8 (black). Reference solution for 
(f> 2 (red). 



Here, we use a = 1 and j/j = (—5, —4, ...,4,5). Gaussian basis functions are not orthogonal. We 
therefore calculate the overlap matrix S = [s^] with 

(4.14) Sij = (ghi,ghj) 

(x - yif + (x - yj) 2 



that can be evaluated analytically. The H = [h a b] matrix is again defined by (Dirac notation) : 

(4.15) hob = (gK I C I gh b ). 

Using the Roothaan-Hall method (Sec. 12. 5[) . the best approximation to the propagator eigenfunctions 
(pi — (bj , x) are found by the eigenvectors of the generalized eigenvalue problem 

(4.16) S^Hb, = A 4 b,. 

As shown in Fig. 14.21 c and d, an also nearly perfect approximation of both eigenvalues and eigen- 
functions is achieved even though the basis is smaller than the previous basis sets. Since the Gaussian 
basis set is a good candidate for being used in high-dimensional spaces, this is probably the most use- 
ful result so far. Note that the matrix inversion S _1 can be efficiently calculated with sparse matrix 
methods when a cutoff is used to set nearly-non-overlapping pairs with h a b ~ to 0. 



4.5. Nonlinear optimization. The previous methods used exclusively linear combinations of basis 
functions. A greater degree of freedom in approximating the propagator eigenfunctions is achieved 
by using additional shape parameters in the basis functions. This, however, leads to a nonlinear 
optimization problem that is in general difficult to solve and may have multiple optima. However, we 
briefly illustrate the approach on our one-dimensional example. We make the Ansatz for the second 
half-weighted eigenfunction: 

(4.17) 4> 2 {x) = -= (-gh{x,y 2 ,s 2 ) + gh(x,y 2 ,s 2 )) 



Z = 



-1 1/2 

dx [-gh{x, y 2 , s 2 ) + gh(x, y 2 , s 2 )) 



The normalization constant makes sure that {4> 2l 4> 2 ) — 1. The constraint {4> 2 ,4>\) = is here ensured 
by the fact that <f>\ is an even function and <p 2 is an odd function. 

The optimal parameters y 2 and s 2 are found by maximizing the Rayleigh coefficient: 

(a 1 o \ f *\ H>2{y2, s 2 ) (j) 2 (y 2 , s 2 )\ 
(4-18) (V2,s 2 ) = argmax( m — \C\ m — ). 

1/2,52 \ H L ' Z I 

Fig. 14.31 shows the results of varying y 2 and s 2 as well as the local optimum for 1/2 = 1 an d s 2 = 0.8. 
In this case, a good approximation to the eigenvector could be achieved with a 2-term ansatz function. 
However, the general usefulness of the nonlinear approach for high-dimensional problems remains to 
be evaluated. 
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4.6. Quartic potential. As a second numerical example, we use the diffusion in a one-dimensional 
quartic potential: 

(4.19) V(x) = 3a; 4 - 6x 2 + 3, 

which has two minimum positions at x = ±1. We seek to estimate the second dominant eigenvalue 
A2(t) and the corresponding time scale t<i = — log ^ 2 ( r ) by applying the Roothaan-Hall method above 
with Gaussian functions. We will then compare the results to those obtained from a Markov state 
model discretization. First, we generate a sample trajectory of the process as in Eq. (I4.4[) . Here, we 
used a time step r = 10~ 3 and a total number of steps N = 10 7 . From this sample, we computed 
an estimate jl of the stationary density. We then computed the Markov state model transition matrix 
and its eigenvalues, using a discretization of the state space into 100 sets. 

For the application of the Roothaan-Hall method, we picked thirteen Gaussian functions fa with centres 
at 

(4.20) x = -2,-1.5,-1.2,-1,-0.8,-0.5,0,0.5,0.8,1,1.2,1.5,2. 

The variances were set to 1 for the functions centred at x = —2, —1.5, 0, 1.5, 2, and to 0.5 for all others. 
Those functions were used as half- weighted basis functions, meaning that we computed the entries of 
the H-matrix according to 

^ N—'tn 

(4.21) hij = — fr~ 1/2 (x k )fa(x k )p,~ 1/2 (x k+m )(i>j(x k+m ) ) 

fe=i 

where m is an integer corresponding to the lag time tut. We similarly estimated the S-matrix and 
then solved the generalized eigenvalue problem Eq. (|2.42l) . 

The results displayed in Figure 14.41 show that we not only get a good approximation of both the 
first and second eigenfunction in terms of smooth functions, but most importantly, the second largest 
eigenvalue Aa(r) and the corresponding time scale ti can be estimated comparably well with both 
methods. While 100 sets were used for the MSM discretization, only thirteen basis function were used 
for the Roothaan-Hall method. 

5. Conclusions and outlook 

Here, we have formulated a variational principle for Markov processes where the dominant eigen- 
functions are approximated by maximizing a Rayleigh coefficient, which - in the limit of the exact 
eigenfunctions - is identical to the true eigenvalues. This is the formulation needed to attack the 
problem of estimating the slow processes in stochastic dynamical systems with a much wider method- 
ology than by the presently used class of Markov State Models. In particular, the entire toolbox of 
quantum mechanics where many decades of research have gone into the development of eigenfunction 
approximation methods for high-dimensional systems becomes available. 

From a practical point of view, a main achievement of the present study is that the Rayleigh coeffi- 
cient can be estimated from simulation data as it is equivalent to an autocorrelation function of the 
appropriately weighted test function. The autocorrelation estimates are such that they can be fed by 
many short simulations distributed across state space and do not require the direct simulation of the 
slow processes in a single long trajectory. This is an important advantage in dealing with the sampling 
problem that arises in simulating metastable dynamical systems. 

A main use of the present approach will be to facilitate the development of adaptive discretization algo- 
rithms of high-dimensional state spaces for the computational characterization of complex dynamical 
processes. The Rayleigh coefficient derived here represents a practically accessible and theoretically 
solid functional to guide such an adaptive discretization algorithm. In contrast to Markov state models, 
such an approximation approach does not necessarily need to use the same basis set for all eigenfunc- 
tions. Especially for reversible dynamics, different eigenfunctions can be approximated separately, thus 
possibly permitting the use of relatively small basis sets. 

For a given class of dynamical systems, a basis sets must be selected that is appropriate to model the 
regularity of the solution. For high-dimensional processes such as molecular dynamics, Gaussian basis 
functions might be a workable solution since they be well combined with clustering-based identifica- 
tion of center positions and permit the analytical calculation of some quantities such as the overlap 



A VARIATIONAL APPROACH TO MODELING SLOW PROCESSES IN STOCHASTIC DYNAMICAL SYSTEMS 15 




FIGURE 4.4. Application of the Roothaan-Hall method with Gaussian basis functions 
to a one-dimensional diffusion process, compared to a 100-set MSM discretization 
computed with the EMMA package [38 . a) The potential function V. b) Estimated 
stationary density l\ compared to the exact solution, c) Comparison of the second 
largest eigenvalue A2(r), estimated by the Roothaan-Hall method and an MSM, both 
plotted against the lag time in integer multiples of At. implied time scale t%, with 
the lag time given in integer multiples m of the simulation step r. d) The second 
eigenfunction li as estimated from both methods. 

integral. An interesting alternative approach is to build the Basis set upon weakly coupled subsets 
of internal molecular coordinates, as suggested in the mean field approach developed in Ref. (T7j. 
The usefulness of these and other approaches for complex molecular systems will be investigated in 
future studies. Furthermore, subsequent studies will deal with the error caused by the projection on 
a finite-dimensional subspace depending on the choice of basis functions, as well as with statistical 
considerations, such as the efficient evaluation of uncertainties of the estimated Rayleigh coefficients. 
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